We study a fragmentation problem where an initial object of size $x$ is broken into $m$ random
pieces provided $x > x_0$, where $x_0$ is an atomic cut-off. Subsequently the fragmentation process
continues for each of those daughter pieces whose sizes are bigger than $x_0$. The process stops when
all the fragments have sizes smaller than $x_0$. We show that the fluctuation of the total number of
splitting events, characterized by the variance, generically undergoes a nontrivial phase transition as
one tunes the branching number $m$ through a critical value $m = m_c$. For $m < m_c$, the fluctuations
are Gaussian, whereas for $m > m_c$ they are anomalously large and non-Gaussian. We apply this
general result to analyze two different search algorithms in computer science.

Fragmentation is a widely studied phenomenon with applications ranging from conventional
fracture of solids and collision-induced fragmentation in atomic nuclei/aggregates to seemingly
unrelated fields such as disordered systems and geology. In this paper we consider a problem
where an object of initial size (or length) $x$ is first broken into $m$ random pieces of sizes
$x_i = r_i x$ with $\sum_{i=1}^{m} r_i = 1$, provided the initial size $x > x_0$, where $x_0$ is a
fixed 'atomic' threshold. At the next stage, each of those $m$ pieces with size bigger than $x_0$
is further broken into $m$ random pieces, and so on. Clearly the process stops after a finite
number of fragmentation or splitting events, when the sizes of all the pieces become less than
$x_0$. This problem and its close cousins have already appeared in numerous contexts including
the energy cascades in turbulence, rupture processes in earthquakes [7], stock market crashes,
binary search algorithms, stochastic fragmentation [12] and DNA segmentation algorithms [13].
It therefore comes as somewhat of a surprise that there is a nontrivial phase transition in this
problem as one tunes the branching number $m$ through a critical value $m = m_c$.

In this Letter we study analytically the statistics of the total number of fragmentation events
$n(x)$ up to the end of the process as a function of the initial size $x$. We show that, while the
average number of events $\overline{n(x)}$ always grows linearly with $x$ for large $x$, the
asymptotic behavior of the variance $v(x)$, characterizing the fluctuations, undergoes a phase
transition at a critical value $m = m_c$:

$$v(x) \sim \begin{cases} x, & m < m_c, \\ x^{2\theta}, & m > m_c. \end{cases} \qquad (1)$$

The exponent $\theta$ is nontrivial and increases monotonically with $m$ for $m > m_c$, starting at
$\theta(m = m_c) = 1/2$, and the amplitude of the leading $x^{2\theta}$ term has log-periodic
oscillations for $m > m_c$. This signals unusually large fluctuations in $n(x)$ for $m > m_c$. The
full distribution of $n(x)$ also changes from being Gaussian for $m < m_c$ to non-Gaussian for
$m > m_c$. This phase transition is rather generic for any fragmentation problem with an 'atomic'
threshold. However, the critical value $m_c$ and the exponent $\theta$ are nonuniversal and depend
on the distribution function of the random fractions $r_i$. In this Letter we establish this generic
phase transition and then calculate explicitly $m_c$ and $\theta$ for two special cases with direct
applications in computer science.

In this fragmentation problem with a fixed lower cut-off $x_0$, one first breaks the initial piece
of length $x$, provided $x > x_0$, into $m$ pieces of sizes $x_i = r_i x$. The sizes of each of these
'daughters' are then examined. Only those pieces whose sizes exceed $x_0$ are considered 'active',
and those with sizes less than $x_0$ are considered 'frozen'. Each of the active pieces is then
subsequently broken into $m$ pieces, and so on. The fractions $r_i$ characterizing a splitting event
are considered to be independent from one event to another but are drawn each time from the same
joint distribution function $\eta_m(r_1, r_2, \ldots, r_m)$. As the splitting process conserves the
total size, the fractions $r_i$ satisfy the constraint $\sum_{i=1}^{m} r_i = 1$. In addition, we
consider the splitting process to be isotropic, i.e., all the $m$ daughters resulting from a
splitting event are statistically equivalent. This indicates that the marginal distribution of any
one of the $r_i$'s is independent of $i$ and is given by

$$\eta_1(r) = \int \eta_m(r, r_2, \ldots, r_m)\, dr_2 \cdots dr_m. \qquad (2)$$

We will henceforth denote the average over the whole history of the splitting procedure (till the
end of the process) by an overline, $\overline{(\cdots)}$, and the average over the $r_i$'s
associated with a single splitting event by $\langle \cdots \rangle$. The conservation law along
with the isotropy implies that $\langle r \rangle = \int_0^1 \eta_1(r)\, r\, dr = 1/m$.

Clearly the total number of splitting events $n(x) = 0$ if $x < x_0$. On the other hand, if
$x > x_0$ there will be at least one splitting, and it is easy to write a recursion relation for
$n(x)$,

$$n(x) = 1 + \sum_{i=1}^{m} n(r_i x). \qquad (3)$$
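
The recursion in Eq. (3) translates directly into a simulation. The following is a minimal sketch rather than the authors' code: it assumes one particular isotropic splitting rule (cutting a piece at $m - 1$ independent uniform points), and the names `split_uniform` and `count_splittings` are hypothetical. An explicit stack replaces the recursion so that large $x$ does not run into Python's recursion limit.

```python
import random

def split_uniform(m, rng):
    """Return m fractions r_1..r_m summing to 1 (assumed rule: m - 1 uniform cuts)."""
    cuts = sorted(rng.random() for _ in range(m - 1))
    edges = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]

def count_splittings(x, m, rng, x0=1.0):
    """Total number of splitting events n(x) for one realization of the process."""
    n = 0
    active = [x] if x > x0 else []
    while active:
        size = active.pop()
        n += 1                          # an active piece always splits
        for r in split_uniform(m, rng):
            if r * size > x0:           # daughters below x0 are frozen
                active.append(r * size)
    return n

rng = random.Random(0)
samples = [count_splittings(50.0, 2, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
print(mean)
```

For this assumed rule with $m = 2$ and $x_0 = 1$, the marginal is uniform and $\overline{n(x)} = 2x - 1$ satisfies the averaged recursion for $x > 1$, so the printed mean should lie close to 99.
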

Using the isotropy of the splitting distribution and taking the average over Eq. (3), we find that
$\mu(x) = \overline{n(x)}$ satisfies the recursion, for $x > x_0$,

$$\mu(x) = 1 + m \langle \mu(rx) \rangle = 1 + m \int_{x_0/x}^{1} dr\, \eta_1(r)\, \mu(rx), \qquad (4)$$

where the lower limit in the above integral comes from the condition $n(x) = 0$ for $x < x_0$.
Without any loss of generality we set $x_0 = 1$, i.e., we measure all sizes in units of the atomic
size. Since $1 < x < \infty$ in Eq. (4), it is convenient to make the change of variable
$x = e^{\alpha}$, so that $0 < \alpha < \infty$, and write $\mu(e^{\alpha}) = F(\alpha)$. The
resulting equation for $F(\alpha)$ is solved by taking the Laplace transform of Eq. (4), and one
finds that $\tilde F(s) = \int_0^{\infty} d\alpha\, F(\alpha)\, e^{-s\alpha}$ is given by

$$\tilde F(s) = \frac{1}{s\,[1 - m\, w(s)]}, \qquad (5)$$

where $w(s) = \langle r^s \rangle = \int_0^1 dr\, \eta_1(r)\, r^s$. Assuming $\tilde F(s)$ has simple
poles at $s = \lambda_k$, the Laplace transform in Eq. (5) can be inverted to obtain
$\mu(x) = a_0 + \sum_k a_k x^{\lambda_k}$ with $a_0 = 1/(1 - m)$ (coming from the pole at $s = 0$)
and $a_k = -1/[m \lambda_k w'(\lambda_k)]$. From the conservation law $\langle r \rangle = 1/m$, one
finds that $s = 1$ is always a pole of $\tilde F(s)$. Besides, since $0 < r < 1$, the pole at
$s = 1$ is also the one with the largest real part and hence will dominate the large $x$ behavior
of $\mu(x)$. Let $\lambda$ and $\lambda^*$ denote the pair of complex conjugate poles with the next
largest real part. Then, keeping only the leading corrections to the asymptotic behavior, one finds

$$\mu(x) \approx a_1 x + a_2 x^{\lambda} + a_3 x^{\lambda^*}, \qquad (6)$$

where $a_1 = -1/[m\, w'(1)] = -1/[m \int_0^1 dr\, \eta_1(r)\, r \ln(r)]$.
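
For completeness, the step from Eq. (4) to Eq. (5) can be spelled out as follows. This is a sketch of the standard Laplace-transform argument under the conventions used above ($x_0 = 1$, and $F(\alpha) = 0$ for $\alpha < 0$ because $n(x) = 0$ for $x < 1$); it is not quoted from the original derivation.

```latex
\begin{align}
% Eq. (4) in the variable \alpha = \ln x; F vanishing at negative argument
% reproduces the lower limit 1/x of the r-integral.
F(\alpha) &= 1 + m \int_0^1 dr\, \eta_1(r)\, F(\alpha + \ln r), \qquad \alpha > 0, \\
% Shift \alpha' = \alpha + \ln r and use F(\alpha') = 0 for \alpha' < 0.
\int_0^\infty d\alpha\, e^{-s\alpha} F(\alpha + \ln r)
  &= r^{s} \int_{\ln r}^{\infty} d\alpha'\, e^{-s\alpha'} F(\alpha')
   = r^{s}\, \tilde F(s), \\
% Transforming the whole equation and averaging r^s over \eta_1 gives w(s).
\tilde F(s) &= \frac{1}{s} + m\, w(s)\, \tilde F(s)
  \;\Longrightarrow\;
  \tilde F(s) = \frac{1}{s\,[1 - m\, w(s)]}, \\
% Residue at a simple zero \lambda_k of 1 - m w(s); it multiplies x^{\lambda_k}.
a_k &= \frac{1}{\lambda_k \left.\frac{d}{ds}\,[1 - m\, w(s)]\right|_{s = \lambda_k}}
     = -\frac{1}{m\, \lambda_k\, w'(\lambda_k)} .
\end{align}
```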

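We now turn to the variance $v(x)$ of the total number of splittings $n(x)$:

$$v(x) = \overline{\left[ n(x) - \overline{n(x)} \right]^2}. \qquad (7)$$

By squaring Eq. (3) and after some straightforward algebra we find the recursion relation

$$v(x) = f(x) + m \int_{1/x}^{1} dr\, \eta_1(r)\, v(rx), \qquad (8)$$

where $f(x) = \left\langle \left[ \sum_{i=1}^{m} \left( \mu(r_i x) - \langle \mu(r_i x) \rangle \right) \right]^2 \right\rangle$.
Once again the change of variable $x = e^{\alpha}$ followed by a subsequent Laplace transform with
respect to $\alpha$, $\tilde v(s) = \int_0^{\infty} d\alpha\, v(e^{\alpha})\, e^{-s\alpha}$ (with
$\tilde f(s)$ defined in the same way from $f$), yields

$$\tilde v(s) = \frac{\tilde f(s)}{1 - m\, w(s)}. \qquad (9)$$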
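
The "straightforward algebra" behind Eqs. (8) and (9) can be organized with the law of total variance. The block below is a sketch under the assumption, implicit in the model, that the sub-histories of different daughters are independent once the fractions $r_i$ of the first splitting are given.

```latex
\begin{align}
% Condition on the fractions (r_1, ..., r_m) of the first splitting event.
v(x) &= \Big\langle \sum_{i=1}^{m} v(r_i x) \Big\rangle
      + \Big\langle \Big[ \sum_{i=1}^{m} \mu(r_i x)
      - \Big\langle \sum_{i=1}^{m} \mu(r_i x) \Big\rangle \Big]^2 \Big\rangle \\
% Isotropy turns the first term into m <v(rx)>; the second term is f(x).
     &= m \int_{1/x}^{1} dr\, \eta_1(r)\, v(rx) + f(x), \\
% The same shift argument as for the mean gives the factor w(s).
\tilde v(s) &= \tilde f(s) + m\, w(s)\, \tilde v(s)
  \;\Longrightarrow\;
  \tilde v(s) = \frac{\tilde f(s)}{1 - m\, w(s)} .
\end{align}
```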



Using the asymptotic expression of $\mu(x)$ from Eq. (6) in the expression for $f(x)$, one finds
that the leading term of $f(x)$ for large $x$ is given by

$$f(x) \approx b_1 x^{2\lambda} + b_2 x^{2\lambda^*} + b_3 x^{\lambda + \lambda^*}, \qquad (10)$$

where the $b_i$'s are constants. The linear term $a_1 x$ of $\mu(x)$ drops out of the leading
behavior of $f(x)$ since, by the conservation law, $\sum_i a_1 r_i x = a_1 x$ does not fluctuate.
These behaviors indicate that $\tilde f(s)$ has poles at $s = 2\lambda$, $s = 2\lambda^*$ and
$s = \lambda + \lambda^*$. Thus when $\mathrm{Re}(\lambda) < 1/2$, these poles occur to the left of
$s = 1$ in the complex $s$ plane. From Eq. (9) it follows that the asymptotic large $x$ behavior of
$v(x)$ will then be controlled by the $s = 1$ pole arising from the denominator $[1 - m\, w(s)]$,
and $v(x) \sim x$ for large $x$. On the other hand, when $\mathrm{Re}(\lambda) > 1/2$, the dominant
poles governing the large $x$ behavior are the three poles of $\tilde f(s)$ with real part
$2\,\mathrm{Re}(\lambda) > 1$. Hence in that case $v(x) \sim x^{2\theta}$, where
$\theta = \mathrm{Re}(\lambda)$. Note also that for $\mathrm{Re}(\lambda) > 1/2$, the amplitude of
the leading term $x^{2\theta}$ in $v(x)$ will have log-periodic oscillations due to the nonzero
imaginary parts of the poles $\lambda$ and $\lambda^*$. This phase transition will always occur
whenever one can tune the pole $\lambda$ continuously through the critical value
$\mathrm{Re}(\lambda) = 1/2$. In the following two examples we show explicitly that this can be
achieved, in a natural way, by tuning the branching number $m$.

[Figure 1: the 3-ary search tree (6 nodes, labelled a to f) built from the sequence
{8, 6, 9, 2, 1, 5, 3, 4, 7, 10}, and the induced splitting of the interval 1, 2, ..., 10
(6 splittings).]

FIG. 1. Top: The construction of the 3-ary tree for a sequence of numbers between 1 and 10.
Bottom: The induced random interval splitting. The nodes and corresponding splittings are
labelled a, b, c, d, e and f. An interval can only split if it contains a •.

The m-ary search tree: It is well known that one of the most efficient ways to sort the incoming
data to a computer is to organize the data on a tree. Consider the sorting of an incoming data
string consisting of $N$ distinct elements labelled by the sequence $1, 2, \ldots, N$. Consider a
particular random sequence of arrival of these $N$ elements (see Fig. 1). An $m$-ary search tree
stores this sequence on a growing tree structure where each node of the tree can contain at most
$(m - 1)$ data points. A node, once filled, branches into $m$ leaves. The first $(m - 1)$ elements
are stored in the root of the tree in an ordered sequence $w_1 < w_2 < \cdots < w_{m-1}$. Any
subsequent element must belong to one of $m$ sets of numbers: $A_1 = [1, w_1)$,
$A_{i+1} = (w_i, w_{i+1})$ with $1 \le i \le m - 2$, and $A_m = (w_{m-1}, N]$. With each of these
sets or intervals we associate a leaf of the tree leading to a new node. A new data point $w$,
arriving subsequently, is sent down the leaf corresponding to the set $A_i$ if $w \in A_i$ and is
stored in a daughter node at the base of that leaf. Once a daughter node is filled with $m - 1$
numbers it in turn gives rise to $m$ new leaves, and so on. An example for a 3-ary search tree
with $N = 10$ is shown in Fig. 1.
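
The insertion rule described above can be sketched in a few lines. This is an illustrative implementation, not code from the paper: the class `Node` and the function `count_nodes` are hypothetical names, and the elements are assumed to be distinct.

```python
from bisect import bisect_left, insort

class Node:
    def __init__(self, m):
        self.keys = []                  # at most m - 1 keys, kept sorted
        self.children = [None] * m      # one slot per interval A_1 .. A_m

def count_nodes(sequence, m):
    """Insert the elements in arrival order; return the number of occupied nodes M."""
    root, occupied = None, 0
    for w in sequence:
        if root is None:
            root, occupied = Node(m), 1
        node = root
        while True:
            if len(node.keys) < m - 1:      # room left: store w in this node
                insort(node.keys, w)
                break
            i = bisect_left(node.keys, w)   # index of the interval containing w
            if node.children[i] is None:    # first element sent down this leaf
                node.children[i] = Node(m)
                occupied += 1
            node = node.children[i]
    return occupied

print(count_nodes([8, 6, 9, 2, 1, 5, 3, 4, 7, 10], 3))  # prints 6
```

The sequence used here is the one of Fig. 1, and the result reproduces the 6 nodes of that figure. For $m = 2$ every node holds exactly one element, so the function returns $N$.
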

Each sequence of the incoming data will give rise to a different $m$-ary tree configuration. If
the incoming data is random, all the arrival sequences occur with equal probability. It is easy to
see that the total number of occupied nodes $M$ (each containing at least one element) is a random
variable as it varies from one tree configuration to another, except for $m = 2$ where $M = N$.
The statistics of $M$ was recently studied by computer scientists using rather involved
combinatorial analysis, and it was found that while $\overline{M} \sim N$ for large $N$, the
variance $v \sim N$ for $m \le 26$ and $v \sim N^{2\theta}$ for $m > 26$ [15]. We show below that
this strange result is just a special case of the general phase transition in the fragmentation
problem discussed here.

The construction of the $m$-ary search tree can be mapped exactly onto the splitting of the
interval $[1, N]$ [11]. It is easy to see that the incoming elements $w_1, w_2, \ldots, w_{m-1}$
split the initial interval into $m$ parts $A_i$. If all the $N!$ possible sequences arrive with
equal probability then the points $w_1, w_2, \ldots, w_{m-1}$ are distributed uniformly on $[1, N]$
(these are the numbers stored in the first node). We split the interval $[1, N]$ into $m$
subintervals corresponding to the $A_i$'s introduced above. If a subinterval $A_i$ is empty (i.e.,
has no • in Fig. 1) then no data points can go down the corresponding leaf, and hence such an
interval (of length $< 2$) will not split any further. If the subset $A_i$ contains only one •,
the arrival of the corresponding single data point still splits the interval into $m$ parts (some
of the intervals so created may be of length 0). This corresponds to the atomic threshold
$x_0 = 2$ in our general problem, and $x = N/2$ corresponds to the initial size in units of the
atomic size.
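
The mapping onto interval splitting can be made concrete with a short recursive sketch that mirrors the recursion $n = 1 + \sum_i n_i$ for the number of splittings. The function name `count_interval_splittings` is hypothetical, and the elements are assumed distinct; the recursion depth grows with the length of the sequence, so this form is meant for small examples only.

```python
def count_interval_splittings(sequence, m):
    """Number of splitting events induced by an arrival sequence of distinct elements."""
    if not sequence:
        return 0                          # empty subinterval: frozen, no splitting
    pivots = sorted(sequence[:m - 1])     # the first m - 1 arrivals cut the interval
    parts = [[] for _ in range(m)]
    for w in sequence[m - 1:]:
        i = sum(p < w for p in pivots)    # index of the subinterval containing w
        parts[i].append(w)                # arrival order inside each part is kept
    return 1 + sum(count_interval_splittings(part, m) for part in parts)

print(count_interval_splittings([8, 6, 9, 2, 1, 5, 3, 4, 7, 10], 3))  # prints 6
```

For the sequence of Fig. 1 this gives 6 splittings, equal to the number of occupied nodes of the 3-ary tree, as the mapping requires.
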

The crucial point is that the number of occupied nodes $M$ in the $m$-ary search tree is identical
to the number of splittings $n(x = N/2)$ in this fragmentation problem. For large $N$ one can pass
to a continuum limit and use the known marginal probability density function
$\eta_1(r) = (m - 1)(1 - r)^{m-2}$ [10] for the continuum interval splitting problem in our general
formula. We get $\overline{M} = \mu(x = N/2) \approx a_1 N/2$ for large $N$, where
$a_1 = 1/[\sum_{k=2}^{m} 1/k]$. Also $w(s) = \langle r^s \rangle = (m - 1) B(s + 1, m - 1)$, where
$B(m, n)$ is the standard Beta function. Therefore the poles of $\tilde F(s)$ in Eq. (5) occur at
the roots of the equation $m(m - 1) B(s + 1, m - 1) = 1$. It is easy to check using Mathematica
that one can arrive at the critical condition $\mathrm{Re}(\lambda) = 1/2$ by tuning $m$ through
the value $m = m_c \approx 26.0461\ldots$. Therefore, from our general theory, we find that the
variance $v \sim N$ for $m < m_c$ and $v \sim N^{2\theta}$ for $m > m_c$. The exponent
$\theta = \mathrm{Re}(\lambda)$, where $\lambda$ is the root of $m(m - 1) B(s + 1, m - 1) = 1$ that
is closest (to the left) to $s = 1$ when $m > m_c$.
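
The root $\lambda$ can also be located without a computer algebra system. The sketch below is one possible approach and not the authors' Mathematica calculation. It assumes integer $m \ge 6$, for which $m(m-1)B(s+1, m-1) = 1$ is equivalent to the polynomial equation $\prod_{k=1}^{m-1}(s + k) = m!$, and it further assumes that the wanted root is the one on which the phases of the factors sum to $2\pi$ with real part in $(-1, 1)$. The function name `second_pole` is hypothetical.

```python
import math

def second_pole(m):
    """Complex root of prod_{k=1}^{m-1} (s + k) = m! next to s = 1 (integer m >= 6).

    Nested bisection: for each x = Re(s), find y = Im(s) > 0 at which the phases
    of the factors sum to 2*pi, then adjust x until the modulus equals m!.
    """
    if m < 6:
        raise ValueError("the total phase 2*pi cannot be reached for m < 6")
    ks = range(1, m)
    log_target = math.lgamma(m + 1)            # ln(m!)

    def phase(x, y):
        return sum(math.atan2(y, x + k) for k in ks)

    def y_at_phase_2pi(x):
        lo, hi = 0.0, 1.0
        while phase(x, hi) < 2.0 * math.pi:    # phase increases with y for x > -1
            hi *= 2.0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if phase(x, mid) < 2.0 * math.pi:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def log_modulus_gap(x):
        y = y_at_phase_2pi(x)
        return sum(0.5 * math.log((x + k) ** 2 + y * y) for k in ks) - log_target

    lo, hi = -1.0 + 1e-9, 1.0
    if log_modulus_gap(lo) > 0.0:
        raise ValueError("no root with real part in (-1, 1) on this branch")
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if log_modulus_gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return complex(x, y_at_phase_2pi(x))

for m in (26, 27, 40):
    print(m, second_pole(m).real)
```

With these assumptions the printed real part should pass through 1/2 between $m = 26$ and $m = 27$, in line with the quoted $m_c \approx 26.0461$.
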

FIG. 2. An example of the splitting of a rectangle into four daughter rectangles about the point
$\mathbf{w} = [l_1 u_1, l_2 u_2]$. The same process is continued on each daughter until the area of
the daughter becomes less than $x_0$.

Cuboid Splitting: In the previous problem we have considered the sorting of a data string where
each element is a scalar. A natural generalization is when each element is a $D$-dimensional
vector $\mathbf{w}$ whose $k$-th component $w_k \in [0, l_k]$ for $1 \le k \le D$. The first
element of the data string is then assigned to the point $\mathbf{w}$ in the cuboid of edge lengths
$l_k$. If the first element is random, then its components are $w_k = u_k l_k$, where the $u_k$ are
independent random variables uniformly distributed on $[0, 1]$. Once this first element is stored,
it splits the original cuboid into $2^D$ sub-cuboids obtained by drawing $D$ lines perpendicular to
each of the faces of the cuboid (see Fig. 2). When the second vector arrives, one compares its
components with those of the first element and places it in one of the $2^D$ sub-cuboids (and
thereby splits that sub-cuboid), and the process continues. After each splitting event, the
dimensions of the $m = 2^D$ new sub-cuboids can be represented by
$l'_k(\sigma) = l_k u_k (1 + \sigma_k)/2 + l_k (1 - u_k)(1 - \sigma_k)/2$, where the
$\sigma_k = \pm 1$ are Ising spins. An example with $D = 2$ is shown in Fig. 2.

The volume of any of the sub-cuboids upon splitting the cuboid of volume $x$ is
$x' = x\, r(\sigma)$, where
$r(\sigma) = \prod_{k=1}^{D} \left[ u_k (1 + \sigma_k)/2 + (1 - u_k)(1 - \sigma_k)/2 \right]$.
Hence in this problem one has $m = 2^D$, and the marginal distribution of a given $r(\sigma)$ can
be shown to be

$$\eta_1(r) = \frac{[-\ln(r)]^{D-1}}{(D-1)!}, \qquad 0 \le r \le 1. \qquad (11)$$

From Eq. (5) with this marginal distribution one finds that

$$\tilde F(s) = \frac{1}{s \left[ 1 - \dfrac{2^D}{(s+1)^D} \right]}. \qquad (12)$$
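
The function $w(s)$ entering Eq. (12) follows in one line from the product structure of $r(\sigma)$. The short derivation below is a sketch that uses only the fact that, for fixed $\sigma$, each factor of $r(\sigma)$ is either $u_k$ or $1 - u_k$ and is therefore uniform on $[0, 1]$ and independent of the other factors.

```latex
\begin{align}
% Moments of a product of D independent uniform variables.
w(s) = \langle r^s \rangle
     &= \prod_{k=1}^{D} \int_0^1 du_k\, u_k^{\,s}
      = \frac{1}{(s+1)^D}, \\
% Equivalent statement in terms of the marginal density of Eq. (11).
\int_0^1 dr\, r^s\, \frac{[-\ln r]^{D-1}}{(D-1)!} &= \frac{1}{(s+1)^D}, \\
% Amplitude of the linear term of the mean, with m = 2^D.
w'(1) = -\frac{D}{2^{D+1}}
  \;&\Longrightarrow\;
  a_1 = -\frac{1}{m\, w'(1)} = \frac{2}{D} .
\end{align}
```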



One then finds $\mu(x) \approx 2x/D$ for large $x$. The function $\tilde F(s)$ has a total of
$(D + 1)$ poles: one at $s = 0$ and the others at $s = -1 + 2e^{2\pi i n/D}$ with
$n = 0, 1, \ldots, (D - 1)$. The poles closest to the left of $s = 1$ are the complex conjugate
pair $\lambda = -1 + 2e^{2\pi i/D}$ and $\lambda^* = -1 + 2e^{-2\pi i/D}$. Thus
$\mathrm{Re}(\lambda) = -1 + 2\cos(2\pi/D)$. From our general theory, it follows that by tuning
$m$, or equivalently $D$, it is possible to encounter the critical point
$\mathrm{Re}(\lambda) = 1/2$ at $D = D_c = \pi/\sin^{-1}(1/(2\sqrt{2})) = 8.69\ldots$. Hence the
variance $v(x) \sim x$ for $D < D_c$, and for $D > D_c$, $v(x) \sim x^{2\theta}$ where
$\theta = \mathrm{Re}(\lambda) = 2\cos(2\pi/D) - 1$.
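
These closed-form expressions are easy to tabulate. The snippet below is a small numerical check, with the hypothetical helper name `theta`; it evaluates $D_c$, verifies that each $\lambda$ satisfies $(\lambda + 1)^D = 2^D$, and labels the regime for a range of integer dimensions.

```python
import cmath
import math

def theta(D):
    """Re(lambda) for the pole pair lambda = -1 + 2 exp(+/- 2 pi i / D)."""
    return 2.0 * math.cos(2.0 * math.pi / D) - 1.0

D_c = math.pi / math.asin(1.0 / (2.0 * math.sqrt(2.0)))
print(round(D_c, 2))  # 8.69

for D in range(2, 13):
    lam = -1.0 + 2.0 * cmath.exp(2j * math.pi / D)
    # lam must be a zero of 1 - 2^D / (s + 1)^D
    assert abs((lam + 1.0) ** D - 2.0 ** D) < 1e-9 * 2.0 ** D
    regime = "v ~ x" if theta(D) < 0.5 else "v ~ x^(2 theta)"
    print(D, round(theta(D), 3), regime)
```

Since $D$ is an integer, the linear regime covers $D \le 8$ and the anomalous regime starts at $D = 9$.
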

We have verified the above predictions by numerically carrying out the splitting procedure on a
large number of samples with atomic cut-off $x_0 = 1$. The analytical predictions for the mean and
the variance are well verified, though an accurate measurement of the exponent $\theta$ is
difficult due to statistical fluctuations and finite size corrections. We have also measured the
histogram of the number of splittings. For $D < D_c$ this distribution is Gaussian; however, for
$D > D_c$ the distribution becomes skewed towards large values of $n(x)$, having an anomalous tail.
Shown in Fig. 3 is the distribution measured for $D = 8$ and $D = 10$. The difference is clearly
visible. The non-Gaussian behavior is also visible for the case $D = 9$ but less pronounced, as
$\mathrm{Re}(\lambda)$ is quite close to 1/2. While we can rigorously prove that the distribution
is indeed Gaussian in the sub-critical regime, we have not been able to calculate the full
distribution in the super-critical regime. Qualitatively it is clear, however, that as $D$
increases the volume of the cuboid becomes more concentrated about its surface and hence the
splitting point $\mathbf{w}$ for large $D$ is generically closer to the surfaces. This means that
the splitting procedure will tend to cut the cuboid into more unequal pieces than at lower
dimensions, a mixture of blocks of larger volume and slices of smaller volume. It is thus the
blocks which are sliced rather than split in their middle which contribute to the long tail in the
distribution of $n(x)$.
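
A numerical experiment of this kind can be sketched as follows. This is not the authors' simulation code: the function name `count_cuboid_splittings` is hypothetical, the sample size and parameters are chosen only to keep the run short, and only volumes are tracked, which suffices because the fractions $r(\sigma)$ do not depend on the shape of the cuboid being split.

```python
import random

def count_cuboid_splittings(x, D, rng, x0=1.0):
    """Number of splittings n(x) of a D-dimensional cuboid of volume x."""
    n = 0
    active = [x] if x > x0 else []
    while active:
        vol = active.pop()
        n += 1
        # Build the 2^D volume fractions r(sigma): one factor u_k or 1 - u_k per axis.
        fractions = [1.0]
        for _ in range(D):
            u = rng.random()
            fractions = [f * side for f in fractions for side in (u, 1.0 - u)]
        for r in fractions:
            if r * vol > x0:            # sub-cuboids below the cut-off are frozen
                active.append(r * vol)
    return n

rng = random.Random(0)
D, x = 2, 100.0
samples = [count_cuboid_splittings(x, D, rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(mean, 2.0 * x / D, var)
```

The sample mean should be close to the large-$x$ prediction $2x/D$. Probing the transition itself would require repeating the run for $D = 8$ to $10$ with much larger $x$ and many more samples.
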

FIG. 3. The distribution $p(n(x))$ of the number of splittings of a cuboid of original volume
$x = 1000$ for $D = 8$ (filled circles) and for $D = 10$ (filled squares); the horizontal axis is
$n(x)$. The distribution is Gaussian for $D = 8$, but has a non-Gaussian skewness for $D = 10$.
The histogram was formed by numerically splitting
5 x 10 samples in each case. 



In conclusion we have shown that a fragmentation pro- 
cess with an atomic threshold can undergo a nontrivial 
phase transition in the fluctuations of the number of split- 
tings at a critical value of the branching number m. The 
calculation of the full probability distribution of the num- 
ber of splittings remains a challenging unsolved problem. 
We have provided applications of our general results in 
two computer science problems. The mechanism of this 
transition is remarkably simple and therefore one expects 
it to be rather generic with broad applications since many 
random processes can be mapped to the type of fragmen- 
tation model considered here. 